Smart Paper Notes

Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis

Alexandra Vioni , Myrsini Christidou , Nikolaos Ellinas , Georgios Vamvoukakis , Panos Kakoulidis , Taehoon Kim , June Sig Sung , Hyoungmin Park , Aimilios Chalamandaris , Pirros Tsiakoulis

Categories: Natural Language Processing | Machine Learning

2021-11-19

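This paper presents a method for controlling prosody at the phoneme level in an autoregressive attention-based text-to-speech system. Instead of learning latent prosodic features, as is commonly done, we extract phoneme-level F0 and duration features directly from the speech data in the training set. Each prosodic feature is discretized using unsupervised clustering, producing a sequence of prosodic labels for each utterance. This sequence is used in parallel with the phoneme sequence to condition the decoder, by means of a prosodic encoder and a corresponding attention module. Experimental results show that the proposed method retains the high quality of the generated speech while allowing phoneme-level control of F0 and duration. By replacing the F0 cluster centroids with musical notes, the model can also provide control over the note and octave within the speaker's range.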
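The discretization step described above can be illustrated with a small sketch. It assumes plain 1-D k-means over per-phoneme mean F0 values; the abstract only says "unsupervised clustering", so the algorithm, the function name `kmeans_1d`, the number of clusters, and the F0 values are all illustrative assumptions rather than the authors' setup.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> Tuple[List[float], List[int]]:
    """Cluster scalar values into k groups; return (centroids, label per value)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid for every value.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Hypothetical per-phoneme mean F0 values (Hz) for one utterance.
f0_per_phoneme = [110.0, 115.0, 180.0, 175.0, 240.0, 250.0, 112.0]
centroids, prosody_labels = kmeans_1d(f0_per_phoneme, k=3)
```

The resulting `prosody_labels` sequence has one label per phoneme, which is the kind of sequence the abstract describes feeding to the prosodic encoder alongside the phoneme sequence.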

Translated by Google Translate

Word-Level Style Control for Expressive, Non-attentive Speech Synthesis

Konstantinos Klapsas , Nikolaos Ellinas , June Sig Sung , Hyoungmin Park , Spyros Raptis

Categories: Natural Language Processing | Machine Learning

2021-11-19

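This paper presents an expressive speech synthesis architecture for modeling and controlling the speaking style at the word level. It attempts to learn word-level stylistic and prosodic representations of the speech data with the aid of two encoders. The first models style by finding a combination of style tokens for each word given the acoustic features, and the second outputs a word-level sequence conditioned only on the phonetic information, in order to disentangle it from the style information. The two encoder outputs are aligned and concatenated with the phoneme encoder outputs and then decoded with a Non-Attentive Tacotron model. An extra prior encoder is used to predict the style tokens autoregressively, so that the model can run without a reference utterance. We find that the resulting model gives both word-level and global control over the style, as well as prosody transfer capabilities.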

Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control

Myrsini Christidou , Alexandra Vioni , Nikolaos Ellinas , Georgios Vamvoukakis , Konstantinos Markopoulos , Panos Kakoulidis , June Sig Sung , Hyoungmin Park , Aimilios Chalamandaris , Pirros Tsiakoulis

Categories: Natural Language Processing | Machine Learning

2021-11-19

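This paper presents a method for phoneme-level prosody control of F0 and duration in a multispeaker text-to-speech setup, based on prosodic clustering. An autoregressive attention-based model is used, incorporating multispeaker architecture modules in parallel with a prosody encoder. Several improvements over the basic single-speaker method are proposed that increase the prosodic control range and coverage. More specifically, we employ data augmentation, F0 normalization, balanced clustering for duration, and speaker-independent prosodic clustering. These modifications enable fine-grained phoneme-level prosody control for all speakers contained in the training set while maintaining speaker identity. The model is also fine-tuned to unseen speakers with limited amounts of data and is shown to maintain its prosody control capabilities, verifying that speaker-independent prosodic clustering is effective. Experimental results verify that the model maintains high output speech quality and that the proposed method allows efficient prosody control within each speaker's range, despite the variability that a multispeaker setting introduces.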
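As a rough illustration of what "balanced" clusters for duration could look like, the sketch below uses equal-frequency binning, so that every duration label covers about the same number of phonemes. This is a stand-in technique chosen for simplicity: the abstract does not specify the balanced clustering procedure, and the function name `equal_frequency_labels`, the bin count, and the durations are hypothetical.

```python
from typing import List


def equal_frequency_labels(values: List[float], k: int) -> List[int]:
    """Assign each value to one of k bins holding (almost) the same number of items."""
    # Rank items by value; ties are broken by original position.
    order = sorted(range(len(values)), key=lambda i: (values[i], i))
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // len(values)
    return labels


# Hypothetical phoneme durations in milliseconds, with a long-tailed outlier.
durations_ms = [40.0, 55.0, 60.0, 62.0, 65.0, 70.0, 120.0, 300.0]
duration_labels = equal_frequency_labels(durations_ms, k=4)
```

Compared with distance-based clustering, this kind of binning keeps a few very long phonemes from claiming a cluster of their own, which is one plausible motivation for balancing duration clusters.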

Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control

Konstantinos Markopoulos , Nikolaos Ellinas , Alexandra Vioni , Myrsini Christidou , Panos Kakoulidis , Georgios Vamvoukakis , Georgia Maniati , June Sig Sung , Hyoungmin Park , Pirros Tsiakoulis

Categories: Natural Language Processing | Machine Learning

2021-11-17

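This paper introduces a text-to-rapping/singing system that can be adapted to any speaker's voice. It uses a Tacotron-based multispeaker acoustic model that is trained exclusively on read speech data and provides prosody control at the phoneme level. Dataset augmentation and additional prosody manipulation based on traditional DSP algorithms are also investigated. The neural TTS model is fine-tuned on limited recordings of an unseen speaker, allowing rapping/singing synthesis in the target speaker's voice. The detailed pipeline of the system is described, which includes extracting the target pitch and duration values from an a cappella song and converting them into the target speaker's valid range of notes before synthesis. An additional stage of prosodic manipulation of the output via WSOLA is also investigated, in order to better match the target duration values. The synthesized utterances can be mixed with an instrumental accompaniment track to produce a complete song. The proposed system is evaluated via subjective listening tests, as well as in comparison with an available alternative system that also aims to produce synthetic singing voice from read-speech-only training data. Results show that the proposed approach can produce high-quality rapping/singing voice with increased naturalness.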
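The conversion of target pitch values into the speaker's valid note range can be sketched as follows. The sketch assumes that pitches are mapped to MIDI note numbers and shifted by whole octaves until they fit; the abstract does not state how the conversion is done, so the octave-shift strategy, the helper names, and the example range are assumptions.

```python
import math


def hz_to_midi(freq_hz: float) -> int:
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = MIDI 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def fit_to_range(note: int, low: int, high: int) -> int:
    """Shift a note by whole octaves until it lies in [low, high]."""
    if high - low < 11:
        raise ValueError("range must span at least 11 semitones (12 distinct notes)")
    while note < low:
        note += 12
    while note > high:
        note -= 12
    return note


# Hypothetical melody pitches (Hz) and a hypothetical speaker range C3..E4 (MIDI 48..64).
melody_hz = [440.0, 880.0, 261.63]
speaker_low, speaker_high = 48, 64
fitted = [fit_to_range(hz_to_midi(f), speaker_low, speaker_high) for f in melody_hz]
```

Shifting by octaves preserves the pitch class of every note, but it can change the melodic contour when neighboring notes are moved by different amounts; a real system might prefer transposing the whole song by one common interval.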

Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

Georgia Maniati , Nikolaos Ellinas , Konstantinos Markopoulos , Georgios Vamvoukakis , June Sig Sung , Hyoungmin Park , Aimilios Chalamandaris , Pirros Tsiakoulis

Categories: Natural Language Processing | Machine Learning

2021-11-17

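The idea of using phonological features instead of phonemes as input to sequence-to-sequence TTS has recently been proposed for zero-shot multilingual speech synthesis. This approach is useful for code-switching, as it facilitates the seamless uttering of foreign text embedded in a stream of native text. In our work, we train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages, with the goal of achieving cross-lingual speaker adaptation. We first experiment with the effect of language phonological similarity on cross-lingual TTS for several source-target language combinations. Subsequently, we fine-tune the model with very limited data of a new speaker's voice in either a seen or an unseen language, and achieve synthetic speech of equal quality while preserving the target speaker's identity. With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature. In the extreme case of only 2 available adaptation utterances, we find that our model behaves as a few-shot learner, as the performance is similar in the seen and unseen adaptation language scenarios.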

High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

Nikolaos Ellinas , Georgios Vamvoukakis , Konstantinos Markopoulos , Aimilios Chalamandaris , Georgia Maniati , Panos Kakoulidis , Spyros Raptis , June Sig Sung , Hyoungmin Park , Pirros Tsiakoulis

Categories: Natural Language Processing | Machine Learning

2021-11-17

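This paper presents an end-to-end text-to-speech system with low latency on a CPU, suitable for real-time applications. The system is composed of an autoregressive attention-based sequence-to-sequence acoustic model and the LPCNet vocoder for waveform generation. An acoustic model architecture that adopts modules from both the Tacotron 1 and Tacotron 2 models is proposed, while stability is ensured by using a recently proposed location-based attention mechanism, suitable for arbitrary sentence lengths. During inference, the decoder is unrolled and acoustic feature generation is performed in a streaming manner, allowing for a nearly constant latency that is independent of the sentence length. Experimental results show that the acoustic model can produce feature sequences about 31 times faster than real time on a computer CPU and 6.5 times faster on a mobile CPU, enabling it to meet the conditions required for real-time applications on both devices. The full end-to-end system can generate speech of almost natural quality, which is verified by listening tests.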

Segmentation based tracking of cells in 2D+time microscopy images of macrophages

Seol Ah Park , Tamara Sipka , Zuzana Kriva , George Lutfalla , Mai Nguyen-Chi , Karol Mikula

Categories: Computer Vision

2023-01-02

The automated segmentation and tracking of macrophages during their migration are challenging tasks due to their dynamically changing shapes and motions. This paper proposes a new algorithm for automatic cell tracking in time-lapse microscopy macrophage data. First, we design a segmentation method employing space-time filtering, local Otsu's thresholding, and the SUBSURF (subjective surface segmentation) method. Next, partial trajectories of cells that overlap in the temporal direction are extracted from the segmented images. Finally, the extracted trajectories are linked by considering their direction of movement. The segmented images and the trajectories obtained with the proposed method are compared with those of semi-automatic segmentation and manual tracking. The proposed tracking achieved an accuracy of 97.4% on macrophage data under challenging conditions such as feeble fluorescent intensity, irregular shapes, and the motion of macrophages. We expect that the automatically extracted trajectories of macrophages can provide evidence of how macrophages migrate depending on their polarization modes in situations such as wound healing.
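For readers unfamiliar with the thresholding step, here is a minimal sketch of Otsu's method on the pixels of a single window. It implements the textbook algorithm (maximizing between-class variance) rather than the paper's pipeline; a "local" variant would apply it per image window, and the window contents, function name, and 8-bit intensity range are assumptions.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return t maximizing between-class variance; pixels <= t form the background."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the background class so far
    sum0 = 0  # intensity sum of the background class so far
    for t in range(levels - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Hypothetical 8-bit intensities from one local window: dark background, bright cell.
window = [10, 12, 11, 13, 200, 210, 205, 12]
threshold = otsu_threshold(window)
mask = [p > threshold for p in window]
```

Computing the threshold per window instead of per image lets dim cells be separated from their immediate background even when other regions of the frame are much brighter.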

DMOps: Data Management Operation and Recipes

Eujeong Choi , Chanjun Park

Categories: Machine Learning

2023-01-02

Data-centric AI has shed light on the significance of data within the machine learning (ML) pipeline. Acknowledging its importance, academia, industry, and government departments have proposed various research directions and policies. Although the capability to utilize existing data is essential, the capability to build a dataset has become more important than ever. In consideration of this trend, we propose "Data Management Operation and Recipes" (DMOps), which will guide the industry regardless of the task or domain. In other words, this paper presents the concept of DMOps, derived from real-world experience. By offering a baseline for building data, we want to help the industry streamline its data operations.

Situation-Aware Deep Reinforcement Learning for Autonomous Nonlinear Mobility Control in Cyber-Physical Loitering Munition Systems

Hyunsoo Lee , Soohyun Park , Won Joon Yun , Soyi Jung , Joongheon Kim

Categories: Robotics

2022-12-31

With the rapid development of drone technologies, drones are widely used in many applications, including military domains. This paper proposes a novel situation-aware deep reinforcement learning (DRL)-based autonomous nonlinear drone mobility control algorithm for cyber-physical loitering munition applications. On the battlefield, designing a DRL-based autonomous control algorithm is not straightforward because real-world data gathering is generally not possible. Therefore, the approach in this paper is to construct a cyber-physical virtual environment in Unity. Based on virtual cyber-physical battlefield scenarios, a DRL-based automated nonlinear drone mobility control algorithm can be designed, evaluated, and visualized. Moreover, real-world battlefield scenarios contain many obstacles, which are harmful to linear trajectory control. Thus, the proposed autonomous nonlinear drone mobility control algorithm uses situation-aware components that are implemented with a Raycast function in Unity virtual scenarios. Based on the gathered situation-aware information, the drone can autonomously and nonlinearly adjust its trajectory during flight. This approach is therefore beneficial for avoiding obstacles in obstacle-deployed battlefields. Our visualization-based performance evaluation shows that the proposed algorithm is superior to the other, linear mobility control algorithms.

Macro-block dropout for improved regularization in training end-to-end speech recognition models

Chanwoo Kim , Sathish Indurti , Jinhwan Park , Wonyong Sung

Categories: Machine Learning | Natural Language Processing

2022-12-29

This paper proposes a new regularization algorithm referred to as macro-block dropout. Overfitting has been a difficult problem in training large neural network models. The dropout technique has proven to be simple yet very effective for regularization by preventing complex co-adaptations during training. In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN). Rather than applying dropout to each unit, we apply random dropout to each macro-block. This algorithm has the effect of applying different dropout rates to each layer even if we keep a constant average dropout rate, which has better regularization effects. In our experiments using a Recurrent Neural Network-Transducer (RNN-T), this algorithm shows relative Word Error Rate (WER) improvements of 4.30% and 6.13% over conventional dropout on LibriSpeech test-clean and test-other, respectively. With an Attention-based Encoder-Decoder (AED) model, this algorithm shows relative WER improvements of 4.36% and 5.85% over conventional dropout on the same test sets.
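The core idea, one random keep/drop decision per macro-block instead of per unit, can be sketched in a few lines. The sketch assumes contiguous, equally sized blocks and the usual inverted-dropout rescaling by 1 / (1 - rate); the abstract does not state how blocks are formed or how kept units are scaled, so those details and the function name are assumptions.

```python
import random
from typing import List


def macro_block_dropout(x: List[float], num_blocks: int, rate: float,
                        rng: random.Random) -> List[float]:
    """Zero whole contiguous blocks of x with probability `rate`; rescale kept blocks."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    n = len(x)
    out: List[float] = []
    for b in range(num_blocks):
        start, end = b * n // num_blocks, (b + 1) * n // num_blocks
        keep = rng.random() >= rate  # one draw per macro-block, not per unit
        scale = 1.0 / (1.0 - rate) if keep else 0.0
        out.extend(v * scale for v in x[start:end])
    return out


# Hypothetical 8-unit input split into 4 macro-blocks of 2 units each.
rng = random.Random(0)
inputs = [1.0] * 8
dropped = macro_block_dropout(inputs, num_blocks=4, rate=0.5, rng=rng)
```

With only a handful of blocks per layer, the fraction of units actually dropped varies noticeably from step to step, which is consistent with the abstract's remark that the effective dropout rate differs even though the average rate stays constant.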
